Exploiting prosody for PCFGs with latent annotations

Authors

  • Markus Dreyer
  • Izhak Shafran
Abstract

We propose novel methods for integrating prosody into syntax using generative models. By adopting a grammar whose constituents carry latent annotations, the influence of prosody on syntax can be learned from data. In one method, prosody is used to seed the latent annotations of a grammar, which is then refined through EM iterations. In an orthogonal approach, we integrate prosody into the grammar more explicitly, using a model that jointly observes words and their associated prosody. We evaluate the two methods by parsing speech data from the Switchboard corpus and compare the results against a baseline model that does not use prosody. The experiments show that prosody improves the grammar both in accuracy and in the parsimonious use of parameters.
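The abstract gives no implementation details, so the following is only a minimal sketch in Python of the two ideas it describes, under some assumptions: that prosody is encoded as ToBI-style break indices attached to words, and that latent annotations are integer subscripts on coarse symbols, as in standard PCFG-LA. The names seed_annotation and JointEmissionModel are illustrative and do not come from the paper.

    # Sketch of (1) seeding latent annotations from prosody and
    # (2) a joint emission model over (word, prosody) pairs.
    from collections import defaultdict

    def seed_annotation(symbol, break_index):
        """Map a coarse preterminal plus a prosodic break index to an
        annotated symbol, e.g. ('NN', 4) -> 'NN_3' under a 3-way bucketing.
        The bucketing thresholds are assumptions, not taken from the paper."""
        if break_index >= 4:      # major intonational-phrase boundary
            bucket = 3
        elif break_index >= 2:    # intermediate boundary
            bucket = 2
        else:                     # no salient boundary
            bucket = 1
        return f"{symbol}_{bucket}"

    class JointEmissionModel:
        """Emission probabilities estimated over (word, break) pairs rather
        than words alone; in training, EM would re-estimate these together
        with the rule probabilities of the annotated grammar."""
        def __init__(self):
            self.counts = defaultdict(lambda: defaultdict(float))

        def observe(self, annotated_tag, word, break_index, weight=1.0):
            # `weight` would be an expected count from the inside-outside
            # (EM) pass; here we simply accumulate it.
            self.counts[annotated_tag][(word, break_index)] += weight

        def prob(self, annotated_tag, word, break_index):
            tag_counts = self.counts[annotated_tag]
            total = sum(tag_counts.values())
            return tag_counts[(word, break_index)] / total if total else 0.0

    if __name__ == "__main__":
        print(seed_annotation("NN", 4))        # -> NN_3
        model = JointEmissionModel()
        model.observe("NN_3", "dog", 4)
        model.observe("NN_3", "cat", 1)
        print(model.prob("NN_3", "dog", 4))    # -> 0.5

The intent of the sketch is only to show where prosody enters: once at initialization (replacing the usual random symbol splits) and once in the emission distributions that EM re-estimates.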


Similar articles

Parsing low-resource languages using Gibbs sampling for PCFGs with latent annotations

PCFGs with latent annotations have been shown to be a very effective model for phrase structure parsing. We present a Bayesian model and algorithms based on a Gibbs sampler for parsing with a grammar with latent annotations. For PCFG-LA, we present an additional Gibbs sampler algorithm to learn annotations from training data, which are parse trees with coarse (unannotated) symbols. We show that...

Training Factored PCFGs with Expectation Propagation

PCFGs can grow exponentially as additional annotations are added to an initially simple base grammar. We present an approach where multiple annotations coexist, but in a factored manner that avoids this combinatorial explosion. Our method works with linguistically motivated annotations, induced latent structure, lexicalization, or any mix of the three. We use a structured expectation propagation...

PCFGs, Topic Models, Adaptor Grammars and Learning Topical Collocations and the Structure of Proper Names

This paper establishes a connection between two apparently very different kinds of probabilistic models. Latent Dirichlet Allocation (LDA) models are used as “topic models” to produce a low-dimensional representation of documents, while Probabilistic Context-Free Grammars (PCFGs) define distributions over trees. The paper begins by showing that LDA topic models can be viewed as a special kind of...

Transfer of Grammatical Structure

Recent research has demonstrated that PCFGs with latent annotations are an effective way to provide automated increases in parsing accuracy. We feel that they have more potential than the literature has so far demonstrated, and we further speculate that they could also provide transfer between different corpora. In this paper, we describe our efforts and show that they do indeed provide a signi...

Head-Driven PCFGs with Latent-Head Statistics

Although state-of-the-art parsers for natural language are lexicalized, it was recently shown that an accurate unlexicalized parser for the Penn tree-bank can be simply read off a manually refined treebank. While lexicalized parsers often suffer from sparse data, manual mark-up is costly and largely based on individual linguistic intuition. Thus, across domains, languages, and tree-bank annotat...



Publication date: 2007